Introduction

Advanced Geospatial Data Processing for Social Scientists

Dennis Abel & Stefan Jünger

2025-04-28

The goal of this course

This course will teach you how to exploit R and apply its geospatial techniques in a social science context.

By the end of this course, you should…

  • Be comfortable with using raster data in R, including importing and wrangling raster layers
  • Be able to create maps based on your very own processed data in R

Illustration by Allison Horst

We are (necessarily) selective

There’s a multitude of spatial R packages

  • We cannot cover all of them
  • And we cannot cover all functions
  • You may have used some we are not familiar with

We will show the use of packages we exploit in practice

  • There’s always another way of doing things in R
  • Don’t hesitate to bring up your solutions

You can’t learn everything at once, but you also don’t have to!

Prerequisites for this course

  • Good knowledge of R, its syntax, and internal logic
  • Affinity for using script-based languages
  • Knowledge of fundamentals of geospatial data wrangling and analysis
  • Don’t be scared to wrangle data with complex structures
  • Working versions of R (and RStudio) on your computer

About us (Stefan)

  • Senior Researcher in the team Survey Data Augmentation at the GESIS department Survey Data Curation
  • Ph.D. in Social Sciences, University of Cologne
  • Research interests:
    • Quantitative methods, Geographic Information Systems (GIS)
    • Social inequalities
    • Attitudes towards minorities
    • Environmental attitudes
    • Reproducible research

About us (Dennis)

  • Postdoctoral Researcher in the team Survey Data Augmentation at the GESIS department Survey Data Curation
  • Ph.D. in Political Economy, University of Cologne
  • Research interests:
    • Quantitative methods, Geographic Information Systems (GIS)
    • Environmental attitudes and behavior
    • Public policy
    • Open source software

About us (Amelie)

  • Intern in the team Survey Data Augmentation at the GESIS department Survey Data Curation
  • Undergraduate in Geography, University of Bonn
  • Study Interests:
    • Geographic Information Systems (GIS)
    • Intersectionality of geography and social sciences (e.g. Loss and Damage)
    • Biogeography
    • Climatology
    • Dendrochronology and -ecology

About you

  • What’s your name?
  • Where do you work/research?
  • What are you working on/researching?
  • What is your experience with R or other programming languages?
  • Do you already have experience with geospatial data?

Course schedule

Day       Time         Title
April 28  10:00-11:15  Introduction
April 28  11:15-11:30  Coffee Break
April 28  11:30-13:00  Raster data in R
April 28  13:00-14:00  Lunch Break
April 28  14:00-15:15  Raster data processing
April 28  15:15-15:30  Coffee Break
April 28  15:30-17:00  Graphical display of raster data in maps
April 29  10:00-11:15  Remote sensing datacubes & access to public APIs
April 29  11:15-11:30  Coffee Break
April 29  11:30-13:00  Advanced datacube processing
April 29  13:00-14:00  Lunch Break
April 29  14:00-15:15  Data integration and linking (with survey data)
April 29  15:15-15:30  Coffee Break
April 29  15:30-17:00  Outlook and open session with own application

Relevance of geospatial data in social sciences

Growing interest in geospatial and Earth observation (EO) data in economics and the social sciences has led to a broad spectrum of publications in recent years. We have identified four major subject areas that have recently been addressed with EO data:

  • Environmental attitudes and behavior,
  • Economic development and inequality,
  • Conflict and migration,
  • Political behavior.

Relevance of geospatial data in social sciences

Increased amount of available data

  • Quantitative and on a small spatial scale
  • Often open and freely accessible

Better tools

  • Personal computers with enough horsepower
  • Standard software, such as R, can be used as a Geographic Information System (GIS)

Geospatial data in this course I

In the folder called ./data, you can find (most of) the data files prepped for all the exercises and slides. The following data are included:

Geospatial data in this course II

If you reuse any of the provided data, please make sure to cite the original data sources.

Packages in this course I

We will use plenty of different packages during the course, but only a few are our main drivers (e.g., the sf package). Here’s the list of packages you may need for the exercises:

Packages in this course II

Refresher: What are geospatial data?

Data with a direct spatial reference

→ geo-coordinates x, y (and z)

  • Information about geometries
  • Optional: Content in relation to the geometries

Sources: OpenStreetMap / GEOFABRIK (2018), City of Cologne (2014), and the Statistical Offices of the Federation and the Länder (2016) / Jünger, 2019

Refresher: What is GIS?

Most common understanding: Geographic Information Systems (GIS) as specific software to process geospatial data for

  • Visualization
  • Analysis
  • Interpretation

Refresher: Data specifics

Sources: OpenStreetMap / GEOFABRIK (2018) and City of Cologne (2014)

Formats

  • Vector data (points, lines, polygons)
  • Raster data (grids)

Coordinate reference systems (CRS)

  • Allow projection onto the Earth’s surface
  • Differ in precision for specific purposes

Refresher: Types of CRS

You may hear of geographic, geocentric, projected, or local CRS in your research.

What’s the difference?

  • whether two-dimensional (longitude, latitude) or three-dimensional (+ height) coordinates are used
  • the location of the coordinate system’s origin (center of the Earth or not)
  • projection onto a flat surface (transformation of longitudes and latitudes to x and y coordinates)
  • size of the area covered (the smaller the area, the more precise the projection)

In practice, what matters most is that two or more layers match when integrating them.

Refresher: Coordinate reference system (CRS)

  • CRS is a reference system to determine the precise location of points in space
  • GIS programs MUST know CRS for accurate processing, visualization, and analysis of data
  • CRS is based on the Geographic Coordinate System (GCS) + the Projected Coordinate System (PCS)

Refresher: Geographic Coordinate System (GCS)

Necessary to know where exactly on Earth’s surface data is located

  • GCS uses a three-dimensional spherical surface to define locations based on a datum and lines of latitude and longitude
  • Datum: Mathematical model of the Earth that serves as a reference point by defining the size and shape of the Earth
  • Local datum: Optimizes fit for a particular location (like NAD83)
  • Geocentric datum: Optimizes fit for the entire Earth (like World Geodetic System 1984 - WGS84)
  • WGS84 is standard for GPS and many applications

Source: Caitlin Dempsey

Refresher: Projected Coordinate System (PCS)

Necessary to draw the data on a flat map

  • PCS represents Earth’s surface on a flat plane by mathematical transformations (projections)
  • Coordinate grid: Here we talk about x and y coordinates (= easting and northing)
  • Conversion of degrees of latitude and longitude into measurable units (like meters)

Different projection approaches. Left: planar, middle: conic, right: cylindrical.

Refresher: Common PCS - UTM

Universal Transverse Mercator (UTM) is a global map projection which:

  • Projects the globe onto a cylinder tangent to a central meridian
  • Divides the globe into 60 zones, each 6° of longitude wide
  • Minimizes distortion within each zone
  • Provides high accuracy for small areas
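
A minimal sketch of the zone logic in base R (not part of the original slides): since there are 60 zones around the globe, each spans 6° of longitude, so the zone number follows from the longitude alone. The helper names utm_zone() and utm_epsg() are hypothetical. The EPSG offsets assume the WGS84-based UTM definitions (326xx for the northern, 327xx for the southern hemisphere), and the irregular zones around Norway and Svalbard are ignored.

```r
# Hypothetical helper: UTM zone number (1-60) from a longitude in degrees.
# Simplification: ignores the irregular zones around Norway and Svalbard.
utm_zone <- function(lon) {
  (floor((lon + 180) / 6) %% 60) + 1
}

# Hypothetical helper: EPSG code of the WGS84 / UTM zone for a location.
# Assumes 32600 + zone (north) and 32700 + zone (south).
utm_epsg <- function(lon, lat) {
  ifelse(lat >= 0, 32600, 32700) + utm_zone(lon)
}

utm_zone(6.96)            # Cologne: zone 32
utm_epsg(6.96, 50.94)     # Cologne: 32632
utm_epsg(151.21, -33.87)  # Sydney: 32756
```

A code obtained this way could then be passed to a transformation function, but it is worth checking the result against an EPSG lookup before relying on it.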


Refresher: Layers Must Match!

EPSG:3857

EPSG:3035

Source: Statistical Office of the European Union Eurostat (2018) / Jünger, 2019
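
A minimal sketch of checking and harmonizing the CRS of two layers, assuming the sf package is installed. The single point (roughly Cologne) and all object names are made up for illustration; real layers would be imported from files.

```r
library(sf)

# Illustrative point, given as longitude/latitude in WGS84 (EPSG:4326)
cologne <- st_sfc(st_point(c(6.96, 50.94)), crs = 4326)

# Two copies of the same location in different projected CRS
layer_a <- st_transform(cologne, 3857)  # Web Mercator
layer_b <- st_transform(cologne, 3035)  # ETRS89 / LAEA Europe

# Do the layers share a CRS? Not yet.
same_crs_before <- st_crs(layer_a) == st_crs(layer_b)

# Bring layer_a into the CRS of layer_b, then check again
layer_a_matched <- st_transform(layer_a, st_crs(layer_b))
same_crs_after <- st_crs(layer_a_matched) == st_crs(layer_b)

same_crs_before
same_crs_after
```

Comparing st_crs() of both objects before any spatial operation is a cheap safeguard against silently misaligned layers.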

Refresher: Documentation of CRS

Every geodata object requires a description of the CRS

  • GCS and datum
  • PCS
  • x and y units (like meters)
  • Domain (maximum allowable x and y values)
  • Resolution

Refresher: Old standard: PROJ.4 strings

This is how information about the CRS is defined in the older standard:

+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 

Source: https://epsg.io/3035

(It’s nothing you would type by hand)
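
To make the structure of such a string easier to read, here is a small base R sketch that splits it into its +key=value parameters. This is only an illustration of the string's layout, not how GIS software parses it; the object names are made up.

```r
# The PROJ.4 string for EPSG:3035 as shown on the slide
proj4 <- "+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"

# Split on whitespace and drop the leading "+"
tokens <- strsplit(trimws(proj4), "\\s+")[[1]]
tokens <- sub("^\\+", "", tokens)

# Separate keys from values; flags without "=" (e.g., no_defs) get NA
keys   <- sub("=.*$", "", tokens)
values <- ifelse(grepl("=", tokens), sub("^[^=]*=", "", tokens), NA_character_)
params <- setNames(values, keys)

params[["proj"]]   # projection: "laea" (Lambert azimuthal equal-area)
params[["lat_0"]]  # latitude of the projection center
params[["units"]]  # units of the x and y coordinates
```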

Refresher: WKT (“Well Known Text”)


PROJCS["ETRS89 / LAEA Europe",
    GEOGCS["ETRS89",
        DATUM["European_Terrestrial_Reference_System_1989",
            SPHEROID["GRS 1980",6378137,298.257222101,
                AUTHORITY["EPSG","7019"]],
            TOWGS84[0,0,0,0,0,0,0],
            AUTHORITY["EPSG","6258"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4258"]],
    PROJECTION["Lambert_Azimuthal_Equal_Area"],
    PARAMETER["latitude_of_center",52],
    PARAMETER["longitude_of_center",10],
    PARAMETER["false_easting",4321000],
    PARAMETER["false_northing",3210000],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    AUTHORITY["EPSG","3035"]]

Source: https://epsg.io/3035

Refresher: EPSG Codes

In the end, working with CRS in R is not as challenging as it may seem since we don’t have to use PROJ.4 or WKT strings directly.

Most of the time, it’s enough to use so-called EPSG Codes (“European Petroleum Survey Group Geodesy”), a short sequence of digits (e.g., 3035).
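
A minimal sketch of looking up a CRS by its EPSG code, assuming the sf package is installed. The object name crs_laea is made up for illustration.

```r
library(sf)

# Look up ETRS89 / LAEA Europe by its EPSG code
crs_laea <- st_crs(3035)

crs_laea$epsg         # the numeric EPSG code
crs_laea$proj4string  # PROJ.4-style representation
cat(crs_laea$wkt)     # full WKT definition

# The "EPSG:<code>" string form refers to the same CRS
same_crs <- st_crs("EPSG:3035") == crs_laea
same_crs
```

The code alone is enough for R to retrieve the full definition, which is why the longer representations rarely need to be written by hand.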

Refresher: More details on geospatial data

Let’s learn about geospatial data as we learn about specific formats
